Resource-Efficient Digital Communications:

Research  and  Testbed  Development  in  Support  of  FFW  and  JTRS 

Executive  Summary 

The focus of this research effort has been to develop communication technology well-suited
to tactical military communications generally and the FFW and JTRS programs specifically.
This document describes the following:

• The design and simulation-based testing of a bandwidth-efficient transceiver suitable for
implementation in the SLICE radio developed by ITT Aerospace/Communications Division.
This transceiver uses novel error control coding technology developed at Notre
Dame in conjunction with quadrature amplitude modulation (QAM) to deliver data at up
to 3.75 bits per QAM symbol (not including pilots). The Notre Dame team also developed
modules carrying out synchronization, channel estimation, and equalization. The
resulting transceiver was simulated, and its performance was compared with that of a
competing technology - continuous phase modulation (CPM). It was shown that the power
amplifier backoff required for QAM puts it at a competitive disadvantage to CPM at
spectral efficiencies where both are feasible - i.e., below 2.0 bits/sec/Hz. However,
signal processing techniques (such as predistortion filtering) for QAM signalling are
available to reduce that disadvantage; moreover, at higher spectral efficiencies, a QAM-based
approach may be the only feasible solution.

• The development of a new class of error control codes called low density parity check
(LDPC) convolutional codes. These are convolutional versions of LDPC block codes
that are being incorporated into a variety of communication standards. It has been shown
that LDPC convolutional codes have significant performance and complexity advantages
over their block code counterparts.

• New techniques for reducing the peak-to-average power ratio (PAPR) of Nyquist-filtered
QPSK signals. Reducing the PAPR relaxes the need for linear power amplifiers, resulting
in less expensive, more power-efficient transceivers.

•  Analysis  of  communication  systems  employing  adaptive  modulation,  in  which  aspects 
of  the  transmitted  signal  are  changed  automatically  depending  on  the  quality  of  the 
communication  channel.  Specifically,  we  analyzed  the  effect  of  imperfect  feedback 
-  i.e.,  the  effect  of  the  transmitter  getting  flawed  knowledge  about  the  channel  -  and 
proposed  practical  schemes  based  on  statistical  communication  theory. 


• Development of synchronization and estimation algorithms well suited to orthogonal
frequency division multiplexing (OFDM).


1 Design of Bandwidth-Efficient Mode of Operation for the
SLICE Radio Platform

1.1  Background  and  Motivation: 

The  Soldier  Level  Integrated  Communications  Environment  (SLICE)  is  a  radio  developed  by 
ITT  Aerospace/Communications  Division  of  Ft.  Wayne,  IN  under  contract  to  the  U.S.  Army. 
Normally,  SLICE  operates  in  a  spread-spectrum  mode;  however,  a  narrowband,  bandwidth- 
efficient  mode  of  operation  is  required  for  training  purposes.  Currently,  that  narrowband  mode 
of  operation  is  based  on  licensed  technology  provided  by  TrellisWare,  Inc.  of  Poway,  CA.  The 
TrellisWare  approach  uses  a  high-rate  channel  code  in  conjunction  with  continuous  phase 
modulation  (CPM)  to  effect  a  spectral  efficiency  up  to  slightly  more  than  2.0  bps/Hz. 

Under  this  contract,  Notre  Dame  developed  an  alternative  bandwidth-efficient  mode  of 
operation  based  on  quadrature  amplitude  modulation  (QAM)  and  low-complexity  turbo  codes. 
The  potential  advantages  of  this  approach  are  threefold:  (1.)  better  noise  immunity;  (2.) 
higher  possible  spectral  efficiency;  and  (3.)  no  requirement  to  pay  licensing  fees.  The  primary 
disadvantage  of  the  QAM-based  approach  lies  in  its  reliance  on  linear  high-power  amplifiers, 
which  are  typically  less  power-efficient  than  the  non-linear  amplifiers  used  in  CPM. 

This  section  of  the  report  describes  the  Notre  Dame  design  and  compares  a  QAM-based 
approach  with  a  CPM-based  approach. 

1.2 Low-Complexity Turbo Codes and QAM

The structure of a conventional turbo encoder is shown in Figure 1. In such a structure, incoming
data is encoded twice - first in its original order and then again after it has been re-ordered
- i.e., “permuted” or “interleaved”. (Throughout this document, an interleaver is designated
as a subscripted “π”.) The redundant (parity) bits generated by each of the two encoders are
transmitted along with the original data. An example of the turbo code used in 3G cellular
systems is shown in Figure 2.
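
As a rough illustration of the parallel structure described above, the following Python sketch encodes a block twice, once in its original order and once after interleaving. The two-state accumulator used as the constituent encoder and the example permutation are assumptions chosen for brevity; they are not the 3G constituent code or its interleaver.

```python
def accumulate(bits):
    """Two-state recursive encoder 1/(1+D): each parity bit is the running XOR of the input."""
    state = 0
    parity = []
    for b in bits:
        state ^= b
        parity.append(state)
    return parity


def turbo_encode(data, perm):
    """Return (systematic bits, parity from encoder 1, parity from encoder 2)."""
    interleaved = [data[i] for i in perm]  # re-order the data with the interleaver
    return list(data), accumulate(data), accumulate(interleaved)


data = [1, 0, 1, 1]
perm = [2, 0, 3, 1]  # hypothetical interleaver pattern
systematic, parity1, parity2 = turbo_encode(data, perm)
print(systematic, parity1, parity2)
```

Sending all three output streams gives a rate-1/3 code; higher rates are obtained by puncturing some of the parity bits.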

Figure  3  shows  a  low-complexity  turbo  encoder  designed  by  Massey  and  Costello  at  Notre 
Dame.  Notably,  there  are  four  (not  two)  constituent  encoders,  and  each  of  the  constituent 
encoders  contains  only  a  single  memory  element  -  i.e.,  they  are  all  two-state  constituent  codes. 
The performance of these low-complexity codes has been shown to be superior to that of the
3GPP code [1] at lower complexity [2].

Figure 1. A conventional turbo encoder structure.

Figure 2. An example of a conventional turbo encoder - i.e., the turbo encoder used in 3G
cellular systems.

Figure 3. Low-complexity turbo encoder designed by Notre Dame team.

The Notre Dame design employs a low-complexity turbo encoder used in conjunction
with quadrature amplitude modulation (QAM). In $M$-ary QAM signaling, one of $M = 2^b$
different symbols is transmitted during each symbol period, representing $b$ bits. These $M$
different symbols are represented as points on a two-dimensional grid; specifically, the point
$(s_I, s_Q)$ represents the signal with in-phase component $s_I$ and quadrature component
$s_Q$ - i.e., it represents

$$s(t) = s_I \cos(2\pi f_c t) - s_Q \sin(2\pi f_c t)$$

shifted to the appropriate time interval. (Here, $f_c$ is the carrier frequency; moreover, $s(t)$ is
typically passed through a “shaping filter” prior to transmission to reduce the bandwidth of
the resulting transmitted signal.)
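
The mapping from bits to a point on the grid, and from that point to a passband sample, can be sketched as follows. The Gray labeling of the amplitude levels and the function names are assumptions made for illustration; the report does not specify the labeling used in the Notre Dame design, and the shaping filter is omitted.

```python
import math

# Gray-coded mapping of two bits to one of four amplitude levels (an assumed labeling).
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def qam16_map(bits):
    """Map four bits to a 16-QAM point (s_i, s_q): first two bits pick I, last two pick Q."""
    if len(bits) != 4:
        raise ValueError("16-QAM carries b = 4 bits per symbol")
    return GRAY_LEVELS[tuple(bits[:2])], GRAY_LEVELS[tuple(bits[2:])]


def passband_sample(s_i, s_q, f_c, t):
    """Evaluate s(t) = s_I cos(2 pi f_c t) - s_Q sin(2 pi f_c t) at time t (no shaping filter)."""
    return s_i * math.cos(2 * math.pi * f_c * t) - s_q * math.sin(2 * math.pi * f_c * t)


point = qam16_map([0, 0, 1, 0])
print(point)
print(passband_sample(point[0], point[1], f_c=1.0, t=0.0))
```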

Figure 4 shows the output of the low-complexity turbo encoder mapped onto the 16-QAM
signal set; the “puncturing pattern” indicates that encoded bits are punctured (or deleted) in
three-bit blocks; the pattern “110”, for instance, indicates that the first two bits are transmitted
and the third is punctured. (Note that none of the information bits are transmitted - i.e.,
this is a non-systematic encoder.) With the puncturing of bits carried out as indicated, the
turbo encoder in Figure 4 has a rate of 1/2 - i.e., one information bit per two bits produced
- and so the scheme shown has a spectral efficiency of 2 bits/symbol, or a nominal spectral
efficiency of 2.0 bits/sec/Hz. (Note that if the shaping filter has an excess bandwidth parameter
of $\alpha$, then the actual spectral efficiency of $M$-ary QAM encoded with a rate-$R$ code is
$\gamma = R \log_2(M)/(1 + \alpha)$ bits/sec/Hz; the “nominal” spectral efficiency assumes
$\alpha = 0$, whereas more realistic values would be $0.1 < \alpha < 0.35$.)
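
The puncturing rule and the spectral efficiency calculation in this paragraph can be checked with a short Python sketch. The function names are illustrative, and the excess bandwidth of 0.25 in the last line is an assumed value inside the range quoted above.

```python
import math


def puncture(bits, pattern):
    """Keep bits[i] where the repeating pattern has '1'; delete it where the pattern has '0'."""
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)] == "1"]


def spectral_efficiency(rate, m, alpha=0.0):
    """R * log2(M) / (1 + alpha) in bits/sec/Hz; alpha = 0 gives the nominal value."""
    return rate * math.log2(m) / (1.0 + alpha)


kept = puncture(["a", "b", "c", "d", "e", "f"], "110")
print(kept)                                   # third symbol of each block is deleted
print(spectral_efficiency(0.5, 16))           # nominal value for rate-1/2 16-QAM
print(spectral_efficiency(0.5, 16, 0.25))     # with an assumed excess bandwidth of 0.25
```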
Figure 4. The low-complexity turbo encoder mapped onto 16-QAM modulation signal set.


1.3  A  Description  of  the  Proposed  Architecture 

A  system-level  description  of  the  proposed  transceiver  is  shown  in  Figure  5.  It  consists  of  the 
following  components: 

•  A  “channel  encoder”  that  accepts  information  bits  and  produces  code  bits.  An  important 
parameter  of  the  encoder  is  its  rate  R;  if  it  produces  n  code  bits  for  every  k  information 
bits,  then  it  has  a  rate  R  =  k/n.  The  channel  encoder  in  the  Notre  Dame  design  is 
the  low-complexity  turbo  code  described  in  the  last  section  with  rates  R  =  1/2  and 
R  =  3/4. 

• A “bit interleaver” that changes the order of the bits produced by the channel encoder.
(Note that this interleaver is distinct from the one that is internal to the turbo channel
encoder.) The purpose of this interleaver is to “spread out” coded bits in time so that when
error “bursts” occur - i.e., many errors in close temporal proximity - the effect of the burst is
spread out over the coded sequence.

• A QAM mapper that takes the output bits of the bit interleaver and uses them to select one
of $M = 2^b$ signals to transmit during the associated signaling interval. An example of
this is shown in Figure 4.

• A channel that exhibits both intersymbol interference (ISI) and additive white Gaussian
noise (AWGN), represented in Figure 5 by the module labelled “Fading ISI Channel”
and the adder that combines the output of that module and a random variable z, respectively.

Figure 5. System-level description of the proposed transceiver.

• A synchronization module that performs both symbol-level and frame-level synchronization
of the incoming signal - i.e., it uses the samples of the downconverted waveform
to estimate the timing offset associated with each transmitted symbol and the boundaries
of the modulated frames.

•  A  channel  estimation  algorithm  that  uses  the  “pilot”  (known)  symbols  embedded  in  each 
frame  to  estimate  the  effect  of  the  channel  on  the  unknown  (data-bearing)  symbols  in 
the  frame. 

•  An  equalizer  that  compensates  for  the  effects  of  the  channel,  using  the  channel  estimate 
provided  by  the  channel  estimation  module. 

•  A  “de-mapper”  that  demodulates  the  downconverted  signal  into  a  sequence  of  two- 
dimensional  points  (representing  the  transmitted  symbols  plus  noise  in  each  dimension) 
and  produces  a  sequence  of  “soft”  bit  values  indicating  the  confidence  (or  reliability) 
with  which  each  bit  has  been  detected. 

•  A  bit  deinterleaver  that  re-orders  the  transmitted  bits  into  the  order  they  were  in  prior  to 
bit-interleaving. 

•  Finally,  a  channel  decoder  that  uses  the  soft  bit  estimates  and  the  structure  provided  by 
the  turbo  encoder  to  provide  a  reliable  estimate  of  the  transmitted  data. 
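
The bit interleaver and deinterleaver in the list above form an inverse pair. A minimal Python sketch, assuming a simple rows-by-columns block interleaver (the report does not state which interleaver pattern was used), is:

```python
def interleave(bits, rows, cols):
    """Write row by row into a rows x cols array, then read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Invert interleave(): restore the order the bits had before interleaving."""
    if len(bits) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    out = [None] * (rows * cols)
    k = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[k]
            k += 1
    return out


coded = list(range(12))
sent = interleave(coded, rows=3, cols=4)
print(sent)
print(deinterleave(sent, rows=3, cols=4) == coded)
```

Bits that were adjacent at the encoder output end up several positions apart on the channel, so a burst of channel errors is dispersed after deinterleaving.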
Figure 6. The hop frame structure.

1.4  A  Suitable  Frame  Structure 

To  assess  the  performance  of  the  proposed  system,  an  appropriate  data  format  was  developed. 
Based  on  input  from  ITT  engineers,  a  frequency-hopping  architecture  was  assumed  -  i.e., 
one  frame  of  data  is  transmitted  at  a  particular  carrier  frequency,  and  then  the  next  frame  is 
transmitted  at  another  (pseudo-randomly  chosen)  carrier  frequency. 

A  frequency-hopping  architecture  requires  that  the  receiver  obtain  synchronization  and  a 
good  estimate  of  the  channel  within  each  frame;  this  in  turn  requires  that  every  frame  contain 
“pilot”  symbols  -  i.e.,  symbols  known  in  advance  to  both  transmitter  and  receiver.  After 
considerable  experimentation,  it  was  determined  that  25  pilot  symbols  were  adequate  to  carry 
out  these  functions,  so  a  frame  structure  that  embeds  25  pilot  symbols  in  the  middle  of  each 
frame  was  adopted.  (See  Figure  6.)  (The  pilot  symbols  are  embedded  in  the  middle  of  the 
frame  because  providing  known  information  in  the  middle  of  the  frame  provides  the  best 
estimate  of  the  channel  out  to  the  frame’s  edge.) 
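
To illustrate how known pilot symbols support channel estimation, the sketch below forms a least-squares estimate of a single complex channel gain from 25 pilots. The flat (single-tap) channel, the pilot values, and the gain `h_true` are simplifying assumptions; the channels considered in this report exhibit ISI and call for a multi-tap estimate and an equalizer.

```python
def estimate_gain(received, pilots):
    """Least-squares estimate of a single complex channel gain h from y_k = h * p_k + noise."""
    num = sum(y * p.conjugate() for y, p in zip(received, pilots))
    den = sum(abs(p) ** 2 for p in pilots)
    return num / den


def equalize(symbols, h):
    """Undo the estimated gain on the data-bearing symbols."""
    return [y / h for y in symbols]


pilots = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 6 + [1 + 1j]  # 25 known symbols
h_true = 0.8 - 0.3j                                          # hypothetical channel gain
received = [h_true * p for p in pilots]                      # noise-free for clarity
h_hat = estimate_gain(received, pilots)
print(len(pilots), abs(h_hat - h_true) < 1e-12)
```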

Each turbo codeword was transmitted over either eight or sixteen hops; this has the effect
of “averaging out” the channel fading over an entire codeword. A total of eight frame
structures were used:

• 8-hop and 16-hop frames for use with 16-QAM and a rate-1/2 channel code;

• 8-hop and 16-hop frames for use with 16-QAM and a rate-3/4 channel code;

• 8-hop and 16-hop frames for use with 32-QAM and a rate-1/2 channel code;

• 8-hop and 16-hop frames for use with 32-QAM and a rate-3/4 channel code.

The  details  of  these  frame  structures  are  described  in  Figures  7-10.  We  briefly  consider 
the  contents  of  Figure  7: 

• For both the 8-hop and 16-hop implementation, there are 8204 bits in each turbo codeword.
Each turbo codeword represents 4096 data bits encoded at a rate of 1/2 with
additional “tail” bits added onto the end of each codeword to drive the encoder into a
known state.


Structure                                           8 hops                16 hops
Data/redundancy/tail bits in each turbo codeword    8204                  8204
Zero padding bits                                   20                    52
Non-pilot symbols per turbo codeword                (8204+20)/4 = 2056    (8204+52)/4 = 2064
Non-pilot symbols per hop                           2056/8 = 257          2064/16 = 129
Pilot symbols per hop                               25                    25
Symbols per hop                                     284                   156

Figure 7. Frame details for systems using 16-QAM modulation and rate-1/2 channel codes.


•  Zeroes  are  padded  into  each  frame  to  make  each  frame  in  an  8-hop  (or  16-hop)  turbo 
codeword  the  same  length. 

• Symbols from the 16-QAM alphabet are formed by taking four bits at a time to form
a single 16-ary symbol. The result is 257 (129) 16-ary symbols in each of the eight
(sixteen) hops making up one codeword.

• Each frame is loaded with 25 pilot symbols, as described above. Finally, two additional
symbol durations are added to each frame to provide time for synchronization. The net
result is 257 + 25 + 2 = 284 (129 + 25 + 2 = 156) symbols in each frame.
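
The frame arithmetic for the 16-QAM, rate-1/2 case can be reproduced with the following Python sketch. The function name and argument names are illustrative; the numbers are those given in Figure 7.

```python
def symbols_per_hop(codeword_bits, pad_bits, bits_per_symbol, hops,
                    pilot_symbols=25, sync_symbols=2):
    """Return (data symbols per hop, total symbols per hop) for one turbo codeword."""
    total_bits = codeword_bits + pad_bits
    if total_bits % bits_per_symbol:
        raise ValueError("padded codeword must fill a whole number of symbols")
    data_symbols = total_bits // bits_per_symbol
    if data_symbols % hops:
        raise ValueError("data symbols must divide evenly among the hops")
    per_hop = data_symbols // hops
    return per_hop, per_hop + pilot_symbols + sync_symbols


print(symbols_per_hop(8204, 20, 4, 8))    # 8-hop, 16-QAM, rate 1/2
print(symbols_per_hop(8204, 52, 4, 16))   # 16-hop, 16-QAM, rate 1/2
```

The zero padding is what makes both divisions come out even: 8204 bits alone do not split into eight (or sixteen) equal groups of four-bit symbols.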


1.5  Simulations  -  Model  Description  and  Performance  Results 

Three static channel models were provided by ITT Aerospace/Communications; they are described
in Figure 11. For example, Channel 1 describes a multipath channel with three path
components - a primary (reference) component, a secondary component with received power
that is 2 dB below the primary component and trails it by 0.2 μs, and a tertiary component
that is 10 dB below the primary component and trails it by 0.4 μs. Similarly, Channels 2 and
3 have five and six path components, respectively. The channels are “static” in that Rayleigh
fading is applied to each path but the fading is constant over a single hop (frame); that is, the
channel does not change appreciably during the time required to transmit a single frame - a
reasonable assumption for the frame rates and mobility assumptions being made.
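
A per-hop realization of Channel 1 might be drawn as in the Python sketch below. The power and delay values come from the description above; normalizing the total average power to one and the use of a fixed random seed are assumptions made for the sketch.

```python
import math
import random

# Channel 1 as described in the text: (relative power in dB, delay in microseconds).
CHANNEL_1 = [(0.0, 0.0), (-2.0, 0.2), (-10.0, 0.4)]


def path_powers(profile):
    """Convert relative path powers from dB to linear scale, normalized to unit total power."""
    linear = [10 ** (db / 10) for db, _ in profile]
    total = sum(linear)
    return [p / total for p in linear]


def draw_hop_taps(profile, rng):
    """Draw one complex Rayleigh-faded gain per path; the gains are held fixed for a whole hop."""
    taps = []
    for power in path_powers(profile):
        sigma = math.sqrt(power / 2)  # per-dimension standard deviation
        taps.append(complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
    return taps


rng = random.Random(1)  # fixed seed so the sketch is repeatable
powers = path_powers(CHANNEL_1)
taps = draw_hop_taps(CHANNEL_1, rng)
print([round(p, 3) for p in powers], len(taps))
```

A new set of taps would be drawn for each hop, which is what lets a codeword spread over eight or sixteen hops average over the fading.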
Figure 8. Frame details for systems using 16-QAM modulation and rate-3/4 channel codes.

Figure 9. Frame details for systems using 32-QAM modulation and rate-1/2 channel codes.

Figure 10. Frame details for systems using 32-QAM modulation and rate-3/4 channel codes.

Figure 11. Descriptions of the three static channel models provided to Notre Dame by ITT
Aerospace/Communications Division.

Figure 12. List of code/modulation configurations that were simulated.


Figure 12 describes the 16 different code/modulation combinations that were simulated
under this project. They range in spectral efficiency from 1.40 bits/sec/Hz to 2.38 bits/sec/Hz
when the effects of all redundancy - error control coding plus pilots - are included.

For comparison, we used bandwidth-efficient continuous phase modulation (CPM) schemes
as proposed by Chugg et al. [3, 4]. In these schemes, bandwidth efficiency is obtained by
increasing the length of the phase pulse (thereby introducing intersymbol interference) and
reducing the modulation index; the net result is an increase in spectral efficiency at the cost
of increased complexity.

The various bandwidth-efficient modes of operation were compared at the same frame error
rate of 1% - an operating point suggested by engineers at ITT Aerospace/Communications
Division. (There is typically a higher-layer ARQ protocol implemented on top of the
physical-layer error control described here. When a frame error occurs - which happens in
1% of the frames at this operating point - the higher-layer protocol requests a re-transmission
of the affected frame.)
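
As a back-of-the-envelope check on what a 1% frame error rate costs the higher-layer protocol, the sketch below computes the mean number of transmissions per frame. It assumes that frame errors are independent and that a frame is re-sent until it succeeds; neither assumption is stated in this report.

```python
def expected_transmissions(frame_error_rate):
    """Mean number of transmissions per frame when each attempt fails independently."""
    if not 0 <= frame_error_rate < 1:
        raise ValueError("frame error rate must lie in [0, 1)")
    return 1.0 / (1.0 - frame_error_rate)


overhead = expected_transmissions(0.01) - 1.0
print(f"{overhead:.4%} extra transmissions at a 1% frame error rate")
```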

We generated two different sets of results for the turbo-QAM design - one that assumes an
ideal linear power amplifier and one that takes into account the power efficiency loss
incurred due to the backoff required for the power amplifier to work in a linear region.
To take this power loss into account, a travelling wave tube (TWT) amplifier model with strong
AM/PM conversion was employed [5]. This model specifies the envelope and the phase of the
output signal as functions of the envelope p(t) of the input signal.

Figure 13. Comparison of turbo-coded QAM and CPM approach to bandwidth-efficient
communication.

The simulation results are summarized in Figure 13. Each code/modulation combination
is characterized by a spectral efficiency (on the x-axis) and a signal-to-noise ratio ($E_b/N_0$)
required to obtain a frame error rate of 1% (on the y-axis). The points labelled “linear amp”
assume that a linear high-power amplifier with no power penalty is available; those labelled
“nonlinear amp, including backoff” make a more realistic assumption - that linear amplification
is obtained by incurring a “backoff” penalty with a nonlinear amplifier.
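
The amplifier equations used in the simulations are not reproduced here. For orientation only, the sketch below implements Saleh's widely used memoryless TWT model with its commonly quoted parameter values; whether reference [5] uses this exact form or these values is an assumption, as are the function names.

```python
import math


def saleh_twt(rho, alpha_a=2.1587, beta_a=1.1517, alpha_p=4.0033, beta_p=9.1040):
    """Return (output envelope, phase shift in radians) for input envelope rho."""
    amplitude = alpha_a * rho / (1.0 + beta_a * rho ** 2)    # AM/AM conversion
    phase = alpha_p * rho ** 2 / (1.0 + beta_p * rho ** 2)   # AM/PM conversion
    return amplitude, phase


def input_backoff_db(rho, beta_a=1.1517):
    """Input backoff relative to saturation, which occurs at rho = 1 / sqrt(beta_a)."""
    rho_sat = 1.0 / math.sqrt(beta_a)
    return 20.0 * math.log10(rho_sat / rho)


for rho in (0.1, 0.5, 1.0 / math.sqrt(1.1517)):
    amp, phase = saleh_twt(rho)
    print(round(rho, 3), round(amp, 3), round(phase, 3), round(input_backoff_db(rho), 2))
```

Operating well below saturation keeps the response close to linear, which QAM needs, but at the cost of the output power that the backoff gives up.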

The  following  observations  can  be  made  regarding  the  simulation  results: 

• Turbo-coded QAM represents a way to increase spectral efficiency on the communication
channels of interest above the (approximately) 2.0-2.5 bits/sec/Hz that can be
provided by CPM. Indeed, with higher rate turbo codes (and QAM constellations of 64
points or more), spectral efficiencies of 3.5 bits/sec/Hz are possible.

•  The  penalty  paid  for  amplifier  backoff  in  QAM  is  substantial  -  as  much  as  12  dB  under 
our  model  -  and  can  tip  the  balance  between  QAM  and  CPM  at  spectral  efficiencies 
where  both  are  potentially  viable. 

•  There  are  signal  processing  techniques  -  for  example,  predistortion  filtering,  in  which 
the  signal  is  distorted  prior  to  amplification  to  compensate  for  the  nonlinear  effects  of 
amplification  -  that  can  “buy  back”  (with  increased  complexity  costs)  some  of  the  power 
penalty  assumed  for  nonlinear  amplifiers  in  Figure  13.  Employing  such  techniques 
could make turbo-coded QAM more competitive with CPM at spectral efficiencies below
2.5 bits/sec/Hz.

2  Other  Supported  Research  Projects 

While  the  bandwidth-efficient  mode  for  SLICE  was  the  focal  point  of  the  Notre  Dame/ITT 
collaboration,  it  was  not  the  only  research  supported  under  award  DAAD16-02-C-0057.  In 
this  section,  we  briefly  describe  four  other  projects  that  were  carried  out  with  this  funding. 

2.1  LDPC  Convolutional  Codes 

Capacity-approaching code designs, such as turbo codes and low-density parity-check (LDPC)
codes, along with iterative message-passing decoding, can be combined with quadrature amplitude
modulation (QAM) using bit-interleaved coded modulation (BICM) to provide highly
reliable, power- and bandwidth-efficient waveforms for SLICE- and FFW-type radio platforms.
As an alternative to the more conventional turbo codes and LDPC block codes, we
have also investigated the use of LDPC convolutional codes, which have several potential
advantages compared to the standard approaches. Our research in this area has been published
widely in leading technical journals and conference proceedings [6, 7, 8, 9, 10, 11, 12]. Here
we briefly summarize some of the more recent aspects of this research.

In [13], we introduced a technique for “unwrapping” regular quasi-cyclic LDPC block
codes (QCLDPC-BCs) to form regular LDPC convolutional codes (LDPC-CCs). In that paper,
we used an unwrapping technique that resulted in a time-invariant LDPC-CC with a significant
“convolutional gain” compared to the underlying QCLDPC-BC. For example, a 0.9 dB gain
at a BER of $10^{-5}$ was obtained for a rate 2/5 LDPC-CC obtained by unwrapping a [155,64]
QCLDPC-BC. The decoding graph representation of the two codes is essentially the same,
with the LDPC-CC graph being obtained by replicating the QCLDPC-BC graph in time. Thus
iterative message passing decoding of the two codes has the same computational and hardware
complexity.

Figure 14. The performance of LDPC convolutional codes derived from quasi-cyclic block
LDPC codes.
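
The quasi-cyclic structure that the unwrapping starts from can be illustrated with a short Python sketch that assembles a parity-check matrix from circulant permutation blocks. The shift values below are illustrative assumptions, and the unwrapping step itself is not implemented here.

```python
def circulant_row(size, shift, row):
    """Row `row` of a size x size identity matrix cyclically shifted right by `shift`."""
    return [1 if col == (row + shift) % size else 0 for col in range(size)]


def qc_parity_check(shifts, size):
    """Stack circulant permutation blocks into a quasi-cyclic parity-check matrix."""
    h = []
    for block_row in shifts:
        for row in range(size):
            line = []
            for shift in block_row:
                line.extend(circulant_row(size, shift, row))
            h.append(line)
    return h


# Hypothetical 3 x 5 array of shifts with circulant size 31, giving block length 155.
SHIFTS = [[1, 2, 4, 8, 16], [5, 10, 20, 9, 18], [25, 19, 7, 14, 28]]
H = qc_parity_check(SHIFTS, 31)
row_weights = {sum(r) for r in H}
col_weights = {sum(col) for col in zip(*H)}
print(len(H), len(H[0]), row_weights, col_weights)
```

Every row has weight 5 and every column weight 3, so the code is regular with design rate 1 - 3/5 = 2/5; redundant rows in the parity-check matrix can make the actual dimension slightly larger than the design value.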

In [14], we introduced a new unwrapping technique that results in a time-varying LDPC-CC
with significantly improved performance. For the same example noted above, the time-varying
unwrapping results in an additional 1.5-2.5 dB of “convolutional gain” beyond that
achieved by the time-invariant unwrapping! (See Figure 14.) This very surprising result comes
from unwrapping the same original QCLDPC-BC, but beginning with a larger circulant size
(block length). In other words, the original QCLDPC-BC operates as a kind of protograph
from which a whole set of LDPC-CCs with significant “convolutional gain” can be derived.

Further, in [15], we applied the new unwrapping techniques to irregular LDPC-BCs constructed
at Caltech’s Jet Propulsion Laboratory (JPL). In particular, the [2560, 1024] (rate 2/5)
JPL ARA code and the punctured [2048, 1024] (rate 1/2) JPL ARA code were chosen for comparison.
Our results show that, with the time-varying unwrapping, the “convolutional gain” is
about 1.0 dB in the rate 2/5 case and about 0.75 dB in the rate 1/2 case. (See Figure 15.) In
addition, we applied the same methods to the irregular LDPC-BC JPL protograph codes.
The protograph codes selected for comparison each had block lengths of about 2500, with
code rates ranging from 1/2 to 4/5. Again, using the time-varying unwrapping, “convolutional
gains” ranging from 0.6 dB to 0.9 dB were obtained. (See Figure 15.)

Figure 15. LDPC convolutional codes derived from JPL’s ARA-based codes (left) and
protograph-based codes (right).

This report briefly summarizes several recently discovered methods by which good LDPC-BCs
can be unwrapped to obtain LDPC-CCs with significant “convolutional gains” and essentially
the same decoding complexity. Among the benefits of these LDPC-CCs compared
to their LDPC-BC counterparts are the following:

• Substantial “convolutional gains” are obtained with no increase in decoding computational
or hardware complexity.

•  The  codes  are  well  suited  for  streaming  data,  since  continuous  decoding  is  achieved 
naturally. 

•  For  framed  data,  a  large  variety  of  frame  lengths  can  be  realized  with  the  same  code 
simply  by  choosing  different  termination  lengths,  so  there  is  no  need  to  design  separate 
codes  for  applications  that  require  multiple  frame  lengths. 

•  A  natural  pipelined  decoding  architecture  can  be  used  to  achieve  high-speed  decoding. 

•  Low  power,  small  area,  and  high-speed  VLSI  implementations  with  minimal  routing 
congestion  can  be  realized  because  of  the  modularity  of  the  decoding  graph. 

• A simple shift-register based systematic encoding circuit can always be implemented.

Among the disadvantages of LDPC-CCs are the following:

• The improved performance comes at the expense of increased latency, since the decoding
graph is extended in time.

• If the same code is used to realize a variety of frame lengths, performance will be suboptimum
for some lengths.

• Depending on the code memory, it may be difficult to efficiently realize short frame
lengths, due to the rate loss caused by termination.

• VLSI decoder implementations can be quite memory intensive.
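
The rate loss caused by termination can be illustrated with a simple model in which a frame of L code blocks is followed by a fixed number of extra termination blocks that carry no new information. This model, the tail length of 20 blocks, and the function name are assumptions for illustration; the actual termination cost depends on the code memory and the termination method.

```python
def terminated_rate(base_rate, frame_blocks, tail_blocks):
    """Effective rate when `tail_blocks` extra code blocks are sent to terminate the frame."""
    return base_rate * frame_blocks / (frame_blocks + tail_blocks)


for frame_blocks in (10, 100, 1000):
    print(frame_blocks, round(terminated_rate(0.5, frame_blocks, tail_blocks=20), 4))
```

Under this model the loss is severe for short frames and becomes negligible as the frame grows long relative to the tail.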

Finally,  we  note  that  the  “convolutional  gains”  obtained  from  unwrapping  tend  to  diminish 
as  the  size  of  the  original  LDPC-BC  gets  large.  This  is  to  be  expected,  since  these  codes 
already  operate  quite  close  to  capacity. 

2.2  Peak-to-Average  Power  Ratio  Reduction  for  Nyquist-Filtered  QPSK 
Modulation 

Consider root raised-cosine pulse-shaped QPSK modulation. We model the pulse-shaping
filter as a convolutional encoder. The constraint length of the encoder is equal to the truncation
length of the filter minus one, and the outputs are the pulse-shaped QPSK waveforms for a
single symbol duration. Therefore, the total number of states of the convolutional encoder is
$4^{M-1}$, where $M$ is the truncation length of the filter. Some edges of the trellis produce
zero-crossing waveforms and some produce peaks. By pruning such edges, i.e., eliminating
the state transitions that would generate large peaks or zero-crossings, we are able to reduce
the peak-to-average power ratio (PAPR) significantly.

Simulations  have  been  carried  out  for  pulse  shaping  filters  with  a  rolloff  factor  equal  to  0, 
0.1,  0.2  and  0.3  respectively.  In  these  simulations,  the  truncating  length  M  is  6  symbols  with 
10  samples  per  symbol.  PAPR  versus  pruning  percentage  is  shown  in  Figure  16. 
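
For reference, the PAPR of unpruned root raised-cosine shaped QPSK can be estimated with the Python sketch below, using the same truncation length (6 symbols) and oversampling (10 samples per symbol) as the simulations. This is only a baseline measurement under assumed random data and a fixed seed; it does not implement the trellis pruning itself.

```python
import math
import random


def rrc(t, beta):
    """Root raised-cosine pulse with symbol period 1 and rolloff beta, evaluated at time t."""
    if abs(t) < 1e-9:
        return 1.0 - beta + 4.0 * beta / math.pi
    if beta > 0 and abs(abs(t) - 1.0 / (4.0 * beta)) < 1e-9:
        arg = math.pi / (4.0 * beta)
        return (beta / math.sqrt(2.0)) * ((1 + 2 / math.pi) * math.sin(arg)
                                          + (1 - 2 / math.pi) * math.cos(arg))
    num = math.sin(math.pi * t * (1 - beta)) + 4 * beta * t * math.cos(math.pi * t * (1 + beta))
    return num / (math.pi * t * (1 - (4 * beta * t) ** 2))


def papr_db(beta, n_symbols=2000, span=6, sps=10, seed=0):
    """Estimate the PAPR (dB) of QPSK shaped by an RRC pulse truncated to `span` symbols."""
    rng = random.Random(seed)
    taps = [rrc((n - span * sps // 2) / sps, beta) for n in range(span * sps + 1)]
    symbols = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_symbols)]
    wave = [0j] * (n_symbols * sps + len(taps) - 1)
    for k, s in enumerate(symbols):           # superimpose one shaped pulse per symbol
        for n, h in enumerate(taps):
            wave[k * sps + n] += s * h
    steady = wave[len(taps):-len(taps)]       # drop the start-up and tail transients
    powers = [abs(x) ** 2 for x in steady]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


for beta in (0.0, 0.1, 0.2, 0.3):
    print(beta, round(papr_db(beta), 2))
```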

Assume QPSK modulation over an additive white Gaussian noise (AWGN) channel. $X^N$
is the input sequence with each symbol $x_k$ drawn from the QPSK constellation. $Y^N$ is the
AWGN channel output. The mutual information between $X^N$ and $Y^N$ can be calculated
numerically using the decision feedback aided BCJR algorithm. The simulation results are
shown in Figure 17. At low SNR, the capacity loss due to pruning is very small. Therefore,
by choosing an appropriate coding scheme, we can reduce the PAPR of QPSK without much
damage to capacity. This pruning method can be extended to all M-ary PSK and M-ary QAM.

2.3  Multilevel  Coding  for  Linear  ISI  channels 

Figure 16. PAPR versus pruning percentage.

Figure 17. Capacity of unpruned and 10%, 30%, and 50% pruned QPSK modulation assuming
an AWGN channel.

Figure 18. Capacity of unpruned and 10%, 30%, 50% pruned QPSK modulation under an
AWGN channel.

For communication channels affected by intersymbol interference (ISI), we propose a multilevel
coding (MLC) scheme with a capacity that approaches the i.i.d. Gaussian input capacity
$C_{iid}$ of the ISI channel with linear complexity in channel memory. The transmitter applies
MLC and linear mapping, in which $M$ independently encoded binary streams are weighted
and summed to produce the sequence of the channel input symbols. The number $M$ of binary
streams is large so that the channel input has an approximately Gaussian distribution. A
multistage receiver performs separate successive linear minimum mean square error (LMMSE)
equalization and decoding on the $M$ streams. Once a stream is decoded, its effect on the
channel output can be completely removed. At the same time, all undecoded streams produce
a non-white additive Gaussian interference to the current stream being decoded. By making
$M$ large, each binary stream operates in a low signal to interference plus noise ratio (SINR)
region. More importantly, the LMMSE equalizer is information lossless in this SINR region.
The overall system is not only computationally efficient, as the complexity scales linearly
with the channel memory $L$ and $M$, but also optimal. The achievable rate of the MLC scheme
with equal rate power allocation, where the power is allocated in such a way that each layer
has equal achievable rate, is plotted in Figure 17 together with $C_{iid}$.
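
The linear mapping step of the transmitter, and the way a large number of streams drives the channel input toward a Gaussian distribution, can be sketched in Python as follows. Uncoded random streams and equal weights are assumptions made for brevity; the scheme described above uses independently encoded streams and an equal-rate power allocation.

```python
import math
import random


def linear_map(streams, weights):
    """Weight and sum M binary (+1/-1) streams into one sequence of channel input symbols."""
    length = len(streams[0])
    return [sum(w * s[k] for w, s in zip(weights, streams)) for k in range(length)]


def kurtosis(samples):
    """Fourth standardized moment; equals 3 for a Gaussian distribution."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return sum((x - mean) ** 4 for x in samples) / (n * var ** 2)


rng = random.Random(0)
M, N = 32, 5000
streams = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
weights = [1 / math.sqrt(M)] * M      # assumed equal-power weights, unit total power
x = linear_map(streams, weights)
power = sum(v * v for v in x) / N
print(round(power, 3), round(kurtosis(x), 3))
```

With equal weights the kurtosis of the sum is 3 - 2/M in expectation, so it approaches the Gaussian value of 3 as M grows.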

2.4  Adaptive  Modulation 

Adaptive  transmission  systems  exploit  some  knowledge  about  the  communication  channel  to 
increase  the  system  capacity  and/or  to  enhance  the  communication  link’s  reliability.  Such 
systems  could  include  a  combination  of  adaptive  modulation  and  coding  (AMC),  multiple 
antenna  systems  including  spatial  multiplexing  and  transmit/receive  diversity  techniques,  and 
multiuser  scheduling. 

Traditionally, research in this area was focused on the impact of partial or noisy channel
state information (CSI) at the receiver in order to compute the throughput and/or BER
performance. On the other hand, adaptive transmission techniques require channel state information
at the transmitter (CSIT) for resource allocation. In multiuser systems, each user’s CSI is
required for user scheduling in order to satisfy some optimality or fairness criterion. AMC
systems require CSI to select the appropriate modulation and coding rate that maximize the
spectral efficiency. Some transmit diversity and spatial multiplexing systems require CSI to
optimize power allocation and transmit precoding, while broadcast systems require CSI for
simultaneous transmission to multiple users. As a result, CSI acquisition at the transmitter is
an active area of research. The sources of noisy CSI include:

•  Modelling  errors  in  the  channel  fading  statistics; 

•  Feedback  delay; 

•  Channel  estimation  error; 

•  Feedback  errors  on  the  reverse  channel; 

• Bandwidth-constrained feedback; and

•  Quantization  error. 
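
To make the role of CSI in AMC concrete, the following minimal sketch shows a receiver quantizing its SNR estimate to a feedback index and a transmitter mapping that index to a constellation. The switching thresholds, the mode table, and all function names are hypothetical values chosen for illustration. They are not taken from this project.

```python
import bisect

# Hypothetical switching thresholds (dB) and bits per symbol for each mode:
# no transmission, BPSK, QPSK, 16-QAM, 64-QAM.
THRESHOLDS_DB = [5.0, 10.0, 16.0, 22.0]
BITS_PER_SYMBOL = [0, 1, 2, 4, 6]

def quantize_snr(snr_db):
    """Receiver side: map the estimated SNR to a feedback index."""
    return bisect.bisect_right(THRESHOLDS_DB, snr_db)

def bits_for_index(index):
    """Transmitter side: constellation size selected by a feedback index."""
    return BITS_PER_SYMBOL[index]

def feedback_outcome(snr_db, received_index):
    """Classify what happens when the transmitter acts on received_index."""
    true_index = quantize_snr(snr_db)
    if received_index > true_index:
        return "too aggressive"    # constellation too large: BER increases
    if received_index < true_index:
        return "too conservative"  # constellation too small: rate is wasted
    return "matched"
```

Every SNR value inside one quantization region maps to the same index, which is the quantization error listed above. An index corrupted on the reverse channel, or one that is outdated because of feedback delay, makes the transmitter select a mode that no longer matches the channel.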

Much of the recent research on noisy CSIT has focused on feedback delay, channel estimation
error and bandwidth-constrained feedback transmission. In most of the literature on
adaptive transmission it has been assumed that the feedback channel itself is error-free. In this
research project, we have investigated practical feedback channels that are subject to feedback
errors, for both single- and multi-user systems. Our results are summarized below:

1. In our studies on the impact of imperfect feedback channels, we have included fading
and additive noise attributes in the feedback channel model considered. We compared
the effects of CSI imperfection due to feedback delay and those due to feedback channel
errors. For single-user frequency division duplex (FDD) AMC systems, we concluded
that feedback errors usually have a significantly greater impact than feedback delay on
system performance (measured by bit-error rate, or BER).

2. We proposed a practical discrete feedback transfer scheme, where system performance
now depends on the feedback detection schemes employed. We first analyzed the performance
of various feedback detection schemes, including maximum likelihood (ML) and
maximum a posteriori (MAP) receivers. We formulated the problem as a classical
multiple-hypothesis testing problem. Conventional wisdom states that the average feedback
BER for a MAP receiver is upper-bounded by the average feedback BER for the ML
receiver. However, this performance advantage does not carry over to the AMC spectral
efficiency and is only true for low to moderate average signal-to-noise ratio (SNR). We
also employed the Finite-State Markov Channel (FSMC) model [16] for a slowly fading
Rayleigh channel to design feedback receivers. The FSMC-based receiver improves
the performance of ML detection by exploiting the one-step transition property of the
FSMC model.


3. We extended our investigations on the feedback channel to multi-user scheduling. We
showed that feedback error could cause not only erroneous constellation selection, but
also erroneous user selection because of quantization error. The coupled effect of
quantization and feedback errors leads to an increase in the outage region in the high SNR
region.

4.  We  also  investigated  the  tradeoff  that  exists  between  increasing  system  capacity  and  the 
cost  of  feedback  for  a  multi-user  system.  We  have  shown  that  though  multiple  users 
provide  another  form  of  diversity  in  fading  channels,  the  cost  of  feedback  limits  the 
number  of  users  that  may  be  served  at  any  one  time.  However,  less  power  may  be 
required  for  training  as  the  number  of  users  increases. 

5. In this project, we also considered the antenna selection problem in multiple-input,
multiple-output systems. This problem addresses the tradeoff that exists between the
spatial diversity benefit of multiple antennas and the cost of radio-frequency (RF) chains.
It is concerned with finding the optimum subset of RF links in a matrix channel with
some specified performance criterion (e.g., capacity). The basic idea is to employ a
greater number of antennas than RF chains, and then select a subset
of these antennas. An optimum solution usually requires an exhaustive search (over
all possible combinations of subsets of antennas). We have developed a sub-optimal
fast selection algorithm that reduces the search complexity by utilizing the channel’s
temporal fading statistics, i.e., the channel’s memory. The main idea here is that, due
to channel memory, at least some of the current optimum subset of antennas will still
be in the optimum subset at the next selection. Simulation results showed that our
algorithm offers a complexity reduction by a factor of 4 to 6, compared to some existing
suboptimum algorithms [17].
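
The channel-memory idea in item 5 can be sketched as follows. This is a minimal illustration and not the algorithm developed in this project. It assumes two transmit antennas, selection of two receive antennas, equal transmit power per antenna, and a first-order Gauss-Markov fading model. All function names are hypothetical. The warm-started search examines only subsets that keep at least one antenna from the previous selection.

```python
import itertools
import math
import random

def capacity(H, rows, snr):
    """Capacity (bits/s/Hz) of the 2x2 channel formed by two rows of H.

    H is a list of receive-antenna rows, each holding two complex gains
    (one per transmit antenna). Evaluates log2 det(I + (snr/2) Hs Hs^H).
    """
    a, b = H[rows[0]], H[rows[1]]
    rho = snr / 2.0  # equal power split over the two transmit antennas
    g11 = 1.0 + rho * (abs(a[0]) ** 2 + abs(a[1]) ** 2)
    g22 = 1.0 + rho * (abs(b[0]) ** 2 + abs(b[1]) ** 2)
    g12 = rho * (a[0] * b[0].conjugate() + a[1] * b[1].conjugate())
    return math.log2(g11 * g22 - abs(g12) ** 2)

def exhaustive_select(H, snr):
    """Optimum pair of receive antennas, found by checking every pair."""
    subsets = itertools.combinations(range(len(H)), 2)
    return max(subsets, key=lambda s: capacity(H, s, snr))

def memory_select(H, snr, previous):
    """Best pair among those sharing at least one antenna with `previous`."""
    candidates = {tuple(sorted(previous))}
    for keep in previous:
        for other in range(len(H)):
            if other != keep:
                candidates.add(tuple(sorted((keep, other))))
    return max(sorted(candidates), key=lambda s: capacity(H, s, snr))

def random_channel(num_rx, rng):
    """i.i.d. Rayleigh channel with unit-variance complex Gaussian gains."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]
            for _ in range(num_rx)]

def evolve(H, alpha, rng):
    """One Gauss-Markov step: alpha close to 1 means strong channel memory."""
    s = math.sqrt(0.5 * (1.0 - alpha ** 2))
    return [[alpha * h + complex(rng.gauss(0, s), rng.gauss(0, s)) for h in row]
            for row in H]
```

With N receive antennas, the warm-started search evaluates 2N - 3 pairs instead of N(N - 1)/2. It can miss the optimum when both antennas of the optimum pair change between selections, which becomes more likely as the channel memory weakens.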

2.5  MIMO  Processing  in  OFDM  Channels 

Orthogonal frequency division multiplexing (OFDM) has gained a great deal of popularity in
recent years due to several advantages, namely, spectral efficiency, immunity
to multipath fading as well as noise, and flexibility in resource allocation. It has been
employed in various commercial systems that include wireless local area networks (WLAN/IEEE
802.11a/g/n and HIPERLAN/2), wireless metropolitan area networks (WMAN/WiMAX, IEEE
802.16), terrestrial digital audio broadcasting (DAB) and terrestrial digital video broadcasting
(DVB) systems.

However, there are some serious drawbacks of OFDM. The most notable ones are high
peak-to-average power ratio (PAPR) and sensitivity to synchronization errors. In this part
of the project, we studied the issues related to synchronization errors. We also investigated
multiple-input multiple-output (MIMO) OFDM systems including synchronization and
interference mitigation.

The synchronization errors of OFDM, which include timing and frequency errors, come
from two sources: the frequency difference between the local oscillators, and the Doppler
spread due to the relative motion between the transmitter and the receiver. Both timing and
frequency errors introduce extra interference (such as inter-channel interference, or ICI) to
OFDM systems and result in performance degradation. In particular, timing errors have two
major effects in OFDM systems: 1) they cause inter-symbol interference (ISI), and 2) they
degrade the performance of channel estimation. If the samples contained in one received OFDM
symbol are influenced by more than one transmitted OFDM symbol, the demodulated signal will
be corrupted by ISI and the orthogonality between subcarriers can also be violated, thereby
causing ICI. Both ISI and ICI result in additional disturbance and distortion to the demodulated
signal and can lead to significant degradation in system performance. In a coherent OFDM
system, timing offset has another, possibly even more severe, impact on system performance,
namely, degradation of the performance of channel estimation. When some portions of the
effective channel are shifted outside the channel estimation window due to timing offset, the
channel estimates will have additional errors. The problem is even more pronounced for
channel estimators that have a narrower estimation window matched to the channel impulse
response (CIR). Likewise, frequency offset causes ICI and destroys orthogonality among
subcarriers in OFDM systems, resulting in a significantly reduced effective signal-to-noise ratio
(SNR). Our results are summarized below:

1. We have developed a fine timing synchronization algorithm that employs the
maximum-likelihood estimation (MLE) method, utilizing the estimated channel impulse response (CIR)
at the receiver to obtain improved timing performance. The estimated CIR at the receiver
is modeled as a complex Gaussian random process parameterized by timing offset. This
proposed scheme can be implemented in either integer timing precision or real-valued
timing precision. For the former, the MLE is simplified as a power-delay-profile (PDP)-based
correlation scheme. For the latter, it is implemented in a PDP-based delay-locked-loop
structure. We have performed theoretical and numerical investigations on the proposed
scheme, which is shown to offer significantly improved estimation and tracking
performance over existing schemes (see, e.g., [18]), measured by the correct estimation
probability.

2. Relevant to carrier synchronization, we developed a carrier frequency offset (CFO)
estimation scheme based on time-domain channel estimates. We showed that time-domain
channel estimates retain the CFO information quite well, exhibited in the form of phase
rotation of the estimated channel multipaths. The resulting CFO estimate turns out to
be an MLE for the CFO. Incorporating the Doppler effect of the fading channel, the
proposed CFO MLE is shown to achieve significant performance gain over existing
schemes such as [19, 20] with reduced complexity. The complexity reduction results
from the fact that the proposed estimation schemes work on the time-domain channel
estimate whose length is much shorter than a typical FFT size in OFDM. Furthermore,
the proposed CFO estimator does not require a specific pilot pattern (some schemes, for
example, require consecutive pilot symbols to be identical).

3. We have extended the CFO estimation scheme to MIMO-OFDM. Note that, for MIMO
systems, the power of additional noise due to synchronization errors increases linearly
with the number of transmitters. Furthermore, the interference due to synchronization
errors in different diversity branches of the receiver originating from the same transmitted
symbols is correlated. This correlated interference, unlike the channel noise, cannot
necessarily be abated by the receiver diversity. Thus its impact on MIMO decoding
becomes more severe (compared to the impact of channel noise) as the number of
receiver antennas increases. Our extended scheme also enjoys improved synchronization
performance with reduced complexity.

4. In addition to the above investigations on synchronization issues, we also developed an
OFDM-based multi-user broadcast scheme that features optimum subchannel and user
allocation (based on instantaneous channel conditions). In our scheme, assuming that
the CSI is known, the subset of users for each subcarrier and the transmit beamforming
vector are optimized (subject to the transmit power constraint), which effectively mitigated
co-channel interference (CCI) and significantly increased the data throughput rate over
existing schemes that either assign each subcarrier to only one user or assign each
subcarrier to all users; see, e.g., [21, 22]. The proposed scheme efficiently exploits the
wireless channels through optimal user and subcarrier allocation according to the
time-varying nature of radio channels and multi-user diversity.
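
The phase-rotation observation in item 2 can be illustrated with a minimal sketch. It is a generic correlation-based estimator under stated assumptions and not the MLE developed in this project. The channel is assumed static over two consecutive OFDM symbols (no Doppler), the CFO is normalized to the subcarrier spacing, and the function names and parameters (fft_size, cp_len) are hypothetical.

```python
import cmath
import math

def rotate_estimate(h, cfo, fft_size, cp_len):
    """Channel estimate one OFDM symbol later, rotated by the CFO.

    A CFO of `cfo` subcarrier spacings advances the phase of every tap by
    2*pi*cfo*(fft_size + cp_len)/fft_size per OFDM symbol.
    """
    step = cmath.exp(2j * math.pi * cfo * (fft_size + cp_len) / fft_size)
    return [tap * step for tap in h]

def estimate_cfo(h_prev, h_curr, fft_size, cp_len):
    """CFO estimate from two consecutive time-domain channel estimates.

    Correlating the taps weights strong multipaths more heavily. The result
    is unambiguous only for |cfo| < fft_size / (2 * (fft_size + cp_len)).
    """
    corr = sum(a.conjugate() * b for a, b in zip(h_prev, h_curr))
    phase = cmath.phase(corr)
    return phase * fft_size / (2.0 * math.pi * (fft_size + cp_len))
```

The correlation runs over the channel taps only, so its cost grows with the channel length and not with the FFT size. This is the source of the complexity advantage noted in item 2.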

3  Development  of  the  Wireless  at  Notre  Dame  (WAND)  Lab 

Finally, we observe that a significant amount of the funding provided under award
DAAD16-02-C-0057 was used to enhance and upgrade the Wireless at Notre Dame (WAND) laboratory.
Among the equipment items purchased with funding from this award were:

•  An  Agilent  E4438C  vector  signal  generator; 

•  Two  Agilent  E4440A  spectrum  analyzers  and  an  Agilent  E8251A  signal  generator; 

•  An  HP  3589A  spectrum/network  analyzer; 

• Two Tektronix WCA280A wireless communication analyzers;

• A Tektronix AWG430 arbitrary waveform generator;

• A Tektronix TDS7104 digital oscilloscope and an HP 54503A digital oscilloscope.